… rate-limit retries
- Add tsx as dev dependency and update supervisor to prefer bun > tsx > npx
- Detect LiteLLM by user-agent header (litellm/*) in addition to x-litellm-* headers
- Force stream=false for all LiteLLM requests (healthchecks don't send x-litellm-* headers)
- Increase MAX_CONCURRENT_SESSIONS default from 10 to 50
- Increase rate-limit retry attempts (2→3) and base delay (1s→2s) with exponential backoff
- Allow rate-limit retry even after partial content was yielded
- Add DEBUG_PROXY=true flag for detailed error diagnosis
- Add prefersStreaming() to AgentAdapter interface (unused but available)
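For reference, the retry schedule implied by "3 attempts, 2s base delay, exponential backoff" might look like the sketch below. The helper name `retryDelayMs` and the doubling factor are assumptions; the description above only gives the attempt count and the base delay.

```typescript
// Hypothetical helper, not code from the PR.
// Assumes the delay doubles on each attempt (the factor is not stated in the description).
function retryDelayMs(attempt: number, baseDelayMs: number = 2000): number {
  // attempt is 0-based: 0 -> base, 1 -> 2x base, 2 -> 4x base
  return baseDelayMs * Math.pow(2, attempt);
}

// Attempt count taken from the description (2 -> 3).
const MAX_RATE_LIMIT_RETRIES = 3;

// Delay before each retry, in milliseconds.
const retrySchedule: number[] = [];
for (let i = 0; i < MAX_RATE_LIMIT_RETRIES; i++) {
  retrySchedule.push(retryDelayMs(i));
}
// retrySchedule is [2000, 4000, 8000]
```

Under these assumptions a request that keeps hitting the rate limit waits about 14 seconds in total before giving up.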
Thanks for putting this together. The LiteLLM adapter concept is solid and we want to get it in. However, the PR bundles several unrelated changes that need to be separated before we can merge anything.

What we're pulling out and merging separately:

What we're not merging from this PR, and why:
Rate-limit retry after partial content — The original guard ("Never retry after response content was yielded — response is committed") was intentional. Removing it for rate-limit errors risks corrupting or duplicating an in-flight SSE stream. The client has already received partial events — resuming on the same connection isn't safe. This needs more careful thought as a standalone change.
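To make the guard concrete, here is a minimal sketch of a streaming retry wrapper that refuses to retry once any chunk has reached the client. All names (`isRateLimitError`, `shouldRetry`, `withRetry`) are hypothetical, and treating HTTP status 429 as the rate-limit signal is an assumption; the project's actual retry code may be structured differently.

```typescript
// Hypothetical sketch of the "never retry after content was yielded" guard.
interface AttemptState {
  attempt: number;          // 0-based index of the attempt that just failed
  contentYielded: boolean;  // true once any chunk has been sent to the client
}

// Assumption: rate-limit errors carry an HTTP-style status of 429.
function isRateLimitError(err: unknown): boolean {
  return typeof err === "object" && err !== null &&
    (err as { status?: number }).status === 429;
}

function shouldRetry(err: unknown, state: AttemptState, maxRetries: number): boolean {
  // Once content has been yielded the response is committed:
  // restarting would replay or duplicate events the client already holds.
  if (state.contentYielded) return false;
  return isRateLimitError(err) && state.attempt < maxRetries;
}

async function* withRetry<T>(
  makeStream: () => AsyncIterable<T>,
  maxRetries: number,
): AsyncGenerator<T> {
  let contentYielded = false;
  for (let attempt = 0; ; attempt++) {
    try {
      for await (const chunk of makeStream()) {
        contentYielded = true;
        yield chunk;
      }
      return; // stream finished cleanly
    } catch (err) {
      if (!shouldRetry(err, { attempt, contentYielded }, maxRetries)) throw err;
      // A backoff delay would go here before the next attempt.
    }
  }
}
```

With this shape, a 429 that arrives before the first chunk starts a fresh attempt, while a 429 after partial output propagates to the caller, which is the behaviour the original guard protects.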
We'll post the clean PR shortly and reference this one.
Auto-detects LiteLLM requests via litellm/* User-Agent or x-litellm-* headers and routes them to a dedicated passthrough adapter.

- adapters/passthrough.ts: LiteLLM adapter (usesPassthrough()=true, prefersStreaming()=false, x-litellm-session-id for session continuity, <env cwd=...> extraction, mcp__litellm__* tool naming)
- adapters/detect.ts: isLiteLLMRequest() detection, passthrough adapter wired in as priority 3 (after Droid and Crush, before OpenCode fallback)
- adapter.ts: add optional prefersStreaming(body) to AgentAdapter interface
- server.ts: move detectAdapter before stream determination; use adapter.prefersStreaming?.(body) to allow adapters to override stream setting (replaces the previous inline LiteLLM header duplication)
- proxy-litellm-adapter.test.ts: 32 new tests covering adapter behaviour and detectAdapter routing
- adapter-detection.test.ts: fix header() mock to handle no-arg call (isLiteLLMRequest calls header() with no args to inspect all headers)
- README.md: LiteLLM setup section, tested agents table entry, passthrough.ts in architecture module map

Closes #199. Based on original work in PR #201 by @endre82.
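A rough sketch of how the detection and the stream override described above could fit together is shown below. The signatures are illustrative only: the real `isLiteLLMRequest()` and `AgentAdapter` in the repository may take different arguments. `HeaderMap`, `resolveStream`, case-insensitive prefix matching, and treating a missing `stream` field as `false` are all assumptions made for this example.

```typescript
// Illustrative only; not the code from adapters/detect.ts or adapter.ts.
type HeaderMap = Record<string, string | undefined>;

// True when any x-litellm-* header is present or the User-Agent starts with "litellm/".
function isLiteLLMRequest(headers: HeaderMap): boolean {
  for (const [rawKey, value] of Object.entries(headers)) {
    const key = rawKey.toLowerCase();
    if (key.startsWith("x-litellm-")) return true;
    if (key === "user-agent" && value !== undefined &&
        value.toLowerCase().startsWith("litellm/")) {
      return true;
    }
  }
  return false;
}

interface AgentAdapter {
  name: string;
  // Optional: an adapter may override the client's requested stream setting.
  prefersStreaming?(body: unknown): boolean | undefined;
}

// The adapter's preference wins; otherwise fall back to the request body.
function resolveStream(adapter: AgentAdapter, body: { stream?: boolean }): boolean {
  return adapter.prefersStreaming?.(body) ?? (body.stream === true);
}

// Example adapters (hypothetical objects for demonstration).
const litellmAdapter: AgentAdapter = { name: "litellm", prefersStreaming: () => false };
const plainAdapter: AgentAdapter = { name: "plain" };
```

Because `prefersStreaming` is optional, adapters that do not define it keep the old behaviour, and only the LiteLLM adapter forces non-streaming responses. Detecting by User-Agent as well as by `x-litellm-*` headers covers requests such as healthchecks, which the original description notes do not send `x-litellm-*` headers.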
#199
feat: add LiteLLM passthrough adapter with x-litellm-* header detection